Two Languages Are More Informative Than One

Authors

  • Ido Dagan
  • Alon Itai
  • Ulrike Schwall
Abstract

This paper presents a new approach for resolving lexical ambiguities in one language using statistical data on lexical relations in another language. This approach exploits the differences between mappings of words to senses in different languages. We concentrate on the problem of target word selection in machine translation, for which the approach is directly applicable, and employ a statistical model for the selection mechanism. The model was evaluated using two sets of Hebrew and German examples and was found to be very useful for disambiguation.

*This research was partially supported by grant number 120-741 of the Israel Council for Research and Development.

1 Introduction

The resolution of lexical ambiguities in non-restricted text is one of the most difficult tasks of natural language processing. A related task in machine translation is target word selection, the task of deciding which target language word is the most appropriate equivalent of a source language word in context. In addition to the alternatives introduced from the different word senses of the source language word, the target language may specify additional alternatives that differ mainly in their usages.

Traditionally, various linguistic levels were used to deal with this problem: syntactic, semantic and pragmatic. Computationally, the syntactic methods are the easiest, but they are of no avail in the frequent situation when the different senses of the word show the same syntactic behavior, having the same part of speech and even the same subcategorization frame. Substantial application of semantic or pragmatic knowledge about the word and its context for broad domains requires compiling huge amounts of knowledge, whose usefulness for practical applications has not yet been proven (Lenat et al., 1990; Nirenburg et al., 1988; Chodorow et al., 1985). Moreover, such methods fail to reflect word usages.
It has been known for many years that the use of a word in the language provides information about its meaning (Wittgenstein, 1953). Also, statistical approaches, which were popular a few decades ago, have recently been revived and found useful for computational linguistics. Consequently, a possible (though partial) alternative to using manually constructed knowledge can be found in the use of statistical data on the occurrence of lexical relations in large corpora. The use of such relations (mainly relations between verbs or nouns and their arguments and modifiers) for various purposes has received growing attention in recent research (Church and Hanks, 1990; Zernik and Jacobs, 1990; Hindle, 1990). More specifically, two recent works have suggested using statistical data on lexical relations for resolving ambiguity cases of PP-attachment (Hindle and Rooth, 1990) and pronoun references (Dagan and Itai, 1990a; Dagan and Itai, 1990b). Clearly, statistical methods can also be useful for target word selection. Consider, for example, the Hebrew sentence extracted from the foreign news section of the daily Haaretz, September 1990 (transcribed into Latin letters).




Publication date: 1991